ABSTRACT



A multi-layer switch search engine architecture is provided. 
According to one aspect of the present invention, a switch 
fabric includes a search engine, and a packet header pro- 
cessing unit. The search engine may be coupled to a for- 
warding database memory and one or more input ports. The 
search engine is configured to schedule and perform 
accesses to the forwarding database memory and to transfer 
forwarding decisions to the one or more input ports. The 
header processing unit is coupled to the search engine and 
includes an arbitrated interface for coupling to the one or 
more input ports. The header processing unit is configured to 
receive a packet header from one or more of the input ports 
and is further configured to construct a search key for 
accessing the forwarding database memory based upon a 
predetermined portion of the packet header. The predeter- 
mined portion of the packet header is selected based upon a 
packet class with which the packet header is associated. 

23 Claims, 9 Drawing Sheets 
SEARCH ENGINE ARCHITECTURE FOR A
HIGH PERFORMANCE MULTI-LAYER 
SWITCH ELEMENT 

FIELD OF THE INVENTION 

The invention relates generally to the field of computer
networking devices. More particularly, the invention relates 
to a multi-layer switch search engine architecture. 

BACKGROUND OF THE INVENTION 

Local area networks (LANs) have become quite sophisticated
in architecture. Originally, LANs were thought of as a
single wire connecting a few computers. Today LANs are 
implemented in complicated configurations to enhance functionality
and flexibility. In such a network, packets are
transmitted from a source device to a destination device; in
more expansive networks, a packet can travel through one
or more switches and/or routers. Standards have been set to
define the packet structure and the layers of functionality and
sophistication of a network. For example, the TCP/IP protocol
stack defines multiple distinct layers, e.g., the
physical layer (layer 1), data link layer (layer 2), network
layer (layer 3), and transport layer (layer 4). A network device
may be capable of supporting one or more of the layers and
may refer to particular fields of the header accordingly.

Today, typical LANs utilize a combination of Layer 2 
(data link layer) and Layer 3 (network layer) network 
devices. In order to meet the ever-increasing performance
demands from the network, functionality that has been
traditionally performed in software and/or in separate layer
2 and layer 3 devices has migrated into one multi-layer
device or switch that implements the performance-critical
functions in hardware.

One of the critical aspects for achieving a cost-effective 
high-performance switch implementation is the architecture 
of the forwarding database search engine, which is the 
centerpiece of every switch design. Therefore, it is desirable 
to optimize partitioning of the functional modules, provide 
efficient interaction between the search engine and its "cli- 
ents" (e.g., switch input ports and the central processing 
unit), and optimize the execution order of events, all of 
which play a crucial role in the overall performance of the 
switching fabric. Also, it is desirable to support diverse 
traffic types and policies by providing flexibility to match 
different packet header fields. Ideally this architecture 
should also allow for a very high level of integration in 
silicon, and linearly scale in performance with the advances 
in silicon technology. 

SUMMARY OF THE INVENTION 

A multi-layer switch search engine architecture is
described. According to one aspect of the present invention, 
a switch fabric includes a search engine, and a packet header 
processing unit. The search engine may be coupled to a 
forwarding database memory and one or more input ports. 
The search engine is configured to schedule and perform 
accesses to the forwarding database memory and to transfer 
forwarding decisions to the one or more input ports. The 
header processing unit is coupled to the search engine and 
includes an arbitrated interface for coupling to the one or 
more input ports. The header processing unit is configured to 
receive a packet header from one or more of the input ports 
and is further configured to construct a search key for 
accessing the forwarding database memory based upon a 
predetermined portion of the packet header. The predetermined
portion of the packet header is selected based upon a
packet class with which the packet header is associated. 
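
As a rough illustration of class-dependent key construction, the following Python sketch selects a portion of a packet header according to the packet class and prepends an encoding of that class. The class names, class codes, byte ranges, and key layout are hypothetical and are chosen only for illustration; they are not taken from the embodiments described herein.

```python
from typing import Dict, Tuple

# Hypothetical table: packet class -> (class code, byte ranges of the header to keep).
# The codes and offsets below are assumptions made for this sketch.
CLASS_FIELDS: Dict[str, Tuple[int, Tuple[Tuple[int, int], ...]]] = {
    # Destination MAC address occupies the first six bytes of an Ethernet header.
    "L2": (0x0, ((0, 6),)),
    # IPv4 destination address in an untagged Ethernet II frame (14 + 16 = byte 30).
    "IP_ROUTABLE": (0x1, ((30, 34),)),
}


def build_search_key(header: bytes, packet_class: str) -> bytes:
    """Return one class-code byte followed by the class-selected header bytes."""
    code, ranges = CLASS_FIELDS[packet_class]
    selected = b"".join(header[start:end] for start, end in ranges)
    return bytes([code]) + selected
```

Under these assumptions, two headers of the same class that agree in the selected bytes produce the same key, regardless of any other header content.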

Other features of the present invention will be apparent 
from the accompanying drawings and from the detailed 
description which follows.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example, 
and not by way of limitation, in the figures of the accompanying
drawings and in which like reference numerals refer
to similar elements and in which: 

FIG. 1 illustrates a switch according to one embodiment 
of the present invention. 

FIG. 2 is a simplified block diagram of an exemplary
switch element that may be utilized in the switch of FIG. 1.

FIG. 3 is a block diagram of the switch fabric of FIG. 2
according to one embodiment of the present invention.

FIG. 4 illustrates the portions of a generic packet header
that are operated upon by the pipelined header preprocessing
subblocks of FIG. 5 according to one embodiment of the
present invention.

FIG. 5 illustrates pipelined header preprocessing subblocks
of the header processing logic of FIG. 3 according to
one embodiment of the present invention.

FIG. 6 illustrates a physical organization of the forwarding
memory of FIG. 2 according to one embodiment of the
present invention.

FIG. 7 is a flow diagram illustrating the forwarding
database memory search supercycle decision logic according
to one embodiment of the present invention.

FIGS. 8A-C are timing diagrams illustrating three exemplary
forwarding database memory search supercycles.

FIG. 9 is a flow diagram illustrating generalized command
processing for typical forwarding database memory
access commands according to one embodiment of the
present invention.

DETAILED DESCRIPTION 

A search engine architecture for a high performance
multi-layer switch element is described. In the following
description, for the purposes of explanation, numerous specific
details are set forth in order to provide a thorough
understanding of the present invention. It will be apparent,
however, to one skilled in the art that the present invention
may be practiced without some of these specific details. In
other instances, well-known structures and devices are
shown in block diagram form.

The present invention includes various steps, which will
be described below. While the steps of the present invention
are preferably performed by the hardware components
described below, the steps may alternatively be embodied in
machine-executable instructions, which may be used to
cause a general-purpose or special-purpose processor programmed
with the instructions to perform the steps. Further,
embodiments of the present invention will be described with
reference to a high speed Ethernet switch employing a
combination of random access memory (RAM) and content
addressable memories (CAMs). However, the method and
apparatus described herein are equally applicable to other
types of network devices such as repeaters, bridges, routers,
brouters, and other network devices and also alternative
memory types and arrangements.

AN EXEMPLARY NETWORK ELEMENT

An overview of one embodiment of a network element 
that operates in accordance with the teachings of the present 
invention is illustrated in FIG. 1. The network element is
used to interconnect a number of nodes and end-stations in
a variety of different ways. In particular, an application of
the multi-layer distributed network element (MLDNE)
would be to route packets according to predefined routing
protocols over a homogenous data link layer such as the 
IEEE 802.3 standard, also known as the Ethernet. Other 
routing protocols can also be used. 

The MLDNE's distributed architecture can be configured
to route message traffic in accordance with a number of
known or future routing algorithms. In a preferred
embodiment, the MLDNE is configured to handle message
traffic using the Internet suite of protocols, and more specifically
the Transmission Control Protocol (TCP) and the
Internet Protocol (IP) over the Ethernet LAN standard and
medium access control (MAC) data link layer. The TCP is
also referred to here as a Layer 4 protocol, while the IP is
referred to repeatedly as a Layer 3 protocol.

In one embodiment of the MLDNE, a network element is
configured to implement packet routing functions in a distributed
manner, i.e., different parts of a function are performed
by different subsystems in the MLDNE, while the
final result of the functions remains transparent to the
external nodes and end-stations. As will be appreciated from
the discussion below and the diagram in FIG. 1, the MLDNE
has a scalable architecture which allows the designer to
predictably increase the number of external connections by
adding additional subsystems, thereby allowing greater flexibility
in defining the MLDNE as a stand-alone router.

As illustrated in block diagram form in FIG. 1, the
MLDNE 101 contains a number of subsystems 110 that are
fully meshed and interconnected using a number of internal
links 141 to create a larger switch. At least one internal link
couples any two subsystems. Each subsystem 110 includes
a switch element 100 coupled to a forwarding and filtering
database 140, also referred to as a forwarding database. The
forwarding and filtering database may include a forwarding
memory 113 and an associated memory 114. The forwarding
memory (or database) 113 stores an address table used for
matching with the headers of received packets. The associated
memory (or database) stores data associated with each
entry in the forwarding memory that is used to identify
forwarding attributes for forwarding the packets through the
MLDNE. A number of external ports (not shown) having
input and output capability interface the external connections
117. In one embodiment, each subsystem supports
multiple Gigabit Ethernet ports, Fast Ethernet ports and
Ethernet ports. Internal ports (not shown) also having input
and output capability in each subsystem couple the internal
links 141. Using the internal links, the MLDNE can connect
multiple switching elements together to form a multigigabit
switch.
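
To make the fully meshed arrangement concrete, the short Python sketch below enumerates the internal links needed when exactly one link couples each pair of subsystems. That one-link-per-pair figure is an assumption for illustration; the description above requires only at least one link per pair. The function name is hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def full_mesh_links(subsystems: Iterable[int]) -> List[Tuple[int, int]]:
    """List one internal link per unordered pair of subsystems.

    For n subsystems this yields n * (n - 1) / 2 links, the minimum
    needed so that any two subsystems are directly coupled.
    """
    return list(combinations(subsystems, 2))
```

For example, four subsystems would need six internal links under this assumption, and each subsystem would terminate three of them.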

The MLDNE 101 further includes a central processing
system (CPS) 160 that is coupled to the individual subsystem
110 through a communication bus 151 such as the
peripheral components interconnect (PCI). The CPS 160
includes a central processing unit (CPU) 161 coupled to a
central memory 163. Central memory 163 includes a copy of
the entries contained in the individual forwarding memories
113 of the various subsystems. The CPS has a direct control
and communication interface to each subsystem 110 and
provides some centralized communication and control
between switch elements.

AN EXEMPLARY SWITCH ELEMENT

FIG. 2 is a simplified block diagram illustrating an 
exemplary architecture of the switch element of FIG. 1. The 
switch element 100 depicted includes a central processing
unit (CPU) interface 215, a switch fabric block 210, a
network interface 205, a cascading interface 225, and a
shared memory manager 220.

Ethernet packets may enter or leave the network switch 
element 100 through any one of the three interfaces 205, 
215, or 225. In brief, the network interface 205 operates in 
accordance with a corresponding Ethernet protocol to 
receive Ethernet packets from a network (not shown) and to 
transmit Ethernet packets onto the network via one or more 
external ports (not shown). An optional cascading interface 
225 may include one or more internal links (not shown) for 
interconnecting switching elements to create larger 
switches. For example, each switch element 100 may be 
connected together with other switch elements in a full mesh 
topology to form a multi-layer switch as described above. 
Alternatively, a switch may comprise a single switch ele- 
ment 100 with or without the cascading interface 225. 

The CPU 161 may transmit commands or packets to the 
network switch element 100 via the CPU interface 215. In 
this manner, one or more software processes running on the 
CPU 161 may manage entries in an external forwarding and 
filtering database 140, such as adding new entries and 
invalidating unwanted entries. In alternative embodiments, 
however, the CPU 161 may be provided with direct access 
to the forwarding and filtering database 140. In any event, 
for purposes of packet forwarding, the CPU port of the CPU 
interface 215 resembles a generic input port into the switch 
element 100 and may be treated as if it were simply another 
external network interface port. However, since access to the 
CPU port occurs over a bus such as a peripheral components 
interconnect (PCI) bus, the CPU port does not need any 
media access control (MAC) functionality. 

Returning to the network interface 205, the two main 
tasks of input packet processing and output packet process- 
ing will now briefly be described. Input packet processing 
may be performed by one or more input ports of the network 
interface 205. Input packet processing includes the follow- 
ing: (1) receiving and verifying incoming Ethernet packets, 
(2) modifying packet headers when appropriate, (3) request- 
ing buffer pointers from the shared memory manager 220 for 
storage of incoming packets, (4) requesting forwarding
decisions from the switch fabric block 210, (5) transferring
the incoming packet data to the shared memory manager 220
for temporary storage in an external shared memory 230,
and (6) upon receipt of a forwarding decision, forwarding
the buffer pointer(s) to the output port(s) indicated by the
forwarding decision. Output packet processing may be performed
by one or more output ports of the network interface
205. Output processing includes requesting packet data from
the shared memory manager 220, transmitting packets onto
the network, and requesting deallocation of buffer(s) after
packets have been transmitted.
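
The input packet processing steps listed above can be sketched in software as follows. This is a simplified Python model for illustration only: the class names, method names, the 14-byte header length check, and the use of the destination MAC address as the lookup key are all assumptions, and header modification (step 2) is omitted.

```python
from typing import Dict, List, Optional


class SharedMemoryManager:
    """Hypothetical stand-in for the shared memory manager 220."""

    def __init__(self) -> None:
        self.buffers: Dict[int, bytes] = {}
        self._next_ptr = 0

    def allocate(self) -> int:
        ptr = self._next_ptr
        self._next_ptr += 1
        return ptr

    def store(self, ptr: int, data: bytes) -> None:
        self.buffers[ptr] = data


class SwitchFabric:
    """Hypothetical stand-in for the switch fabric block 210."""

    def __init__(self, table: Dict[bytes, List[int]]) -> None:
        self.table = table

    def forwarding_decision(self, header: bytes) -> List[int]:
        # Assumed lookup keyed on the 6-byte destination MAC address.
        return self.table.get(header[:6], [])


def process_input_packet(packet: bytes, smm: SharedMemoryManager,
                         fabric: SwitchFabric,
                         output_queues: Dict[int, List[int]]) -> Optional[int]:
    if len(packet) < 14:                        # (1) minimal verification stand-in
        return None
    header = packet[:14]                        # (2) header modification omitted
    ptr = smm.allocate()                        # (3) request a buffer pointer
    ports = fabric.forwarding_decision(header)  # (4) request a forwarding decision
    smm.store(ptr, packet)                      # (5) store the packet data
    for port in ports:                          # (6) hand the pointer to output ports
        output_queues.setdefault(port, []).append(ptr)
    return ptr
```

In this model only the buffer pointer is handed to the output ports, while the packet data itself stays in the shared memory.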

The network interface 205, the CPU interface 215, and the 
cascading interface 225 are coupled to the shared memory 
manager 220 and the switch fabric block 210. Preferably, 
critical functions such as packet forwarding and packet 
buffering are centralized as shown in FIG. 2. The shared 
memory manager 220 provides an efficient centralized interface
to the external shared memory 230 for buffering of
incoming packets. The switch fabric block 210 includes a 
search engine and learning logic for searching and main- 
taining the forwarding and filtering database 140 with the 
assistance of the CPU 161. 

The centralized switch fabric block 210 includes a search 
engine that provides access to the forwarding and filtering 
database 140 on behalf of the interfaces 205, 215, and 225.
Packet header matching, Layer 2 based learning, Layer 2
and Layer 3 packet forwarding, filtering, and aging are
exemplary functions that may be performed by the switch
fabric block 210. Each input port is coupled with the switch
fabric block 210 to receive forwarding decisions for
received packets. The forwarding decision indicates the
outbound port(s) (e.g., external network port or internal
cascading port) upon which the corresponding packet should
be transmitted. Additional information may also be included
in the forwarding decision to support hardware routing such
as a new MAC destination address (DA) for MAC DA
replacement. Further, a priority indication may also be
included in the forwarding decision to facilitate prioritization
of packet traffic through the switch element 100.

In the present embodiment, Ethernet packets are centrally
buffered and managed by the shared memory manager 220.
The shared memory manager 220 interfaces every input port
and output port and performs dynamic memory allocation
and deallocation on their behalf, respectively. During input
packet processing, one or more buffers are allocated in the
external shared memory 230 and an incoming packet is
stored by the shared memory manager 220 responsive to
commands received from the network interface 205, for
example. Subsequently, during output packet processing, the
shared memory manager 220 retrieves the packet from the
external shared memory 230 and deallocates buffers that are
no longer in use. To assure no buffers are released until all
output ports have completed transmission of the data stored
therein, the shared memory manager 220 preferably also
tracks buffer ownership.

INPUT PORT/SWITCH FABRIC INTERFACE

Before describing the internal details of the switch fabric
210, the interface between the input ports (e.g., any port on
which packets may be received) and the switch fabric 210
will now briefly be discussed. Input ports in each of the CPU
interface 215, the network interface 205, and the cascading
interface 225 request forwarding decisions for incoming
packets from the switch fabric 210. According to one
embodiment of the present invention, the following interface
is employed:

(1) Fwd_Req[N:0] - Forward Request Signals
These forward request signals are output by the input
ports to the switch fabric 210. They have two purposes. First,
they serve as an indication to the switch fabric 210 that the
corresponding input port has received a valid packet header
and is ready to stream the packet header to the switch fabric.
A header transfer grant signal (see Hdr_Xfr_Gnt[N:0]
below) is expected to be asserted before transfer of the
packet header will begin. Second, these signals serve as a
request for a forwarding decision after the header transfer
grant is detected. The forward request signals are deasserted
in the clock period after a forwarding decision acknowledgment
is detected from the switch fabric 210 (see Fwd_Ack[N:0]
below).

(2) Hdr_Xfr_Gnt[N:0] - Header Transfer Grant Signals
These header transfer grant signals are output by the
switch fabric 210 to the input ports. More specifically, these
signals are output by the switch fabric's header preprocessing
logic that will be described further below. At any rate,
the header transfer signal indicates the header preprocessing
logic is ready to accept the packet header from the corresponding
input port. Upon detecting the assertion of the
header transfer grant, the corresponding input port will
begin streaming continuous header fields to the switch fabric
210.

(3) Hdr_Bus[X:1][N:0] - The Dedicated Header Bus
The header bus is a dedicated X-bit wide bus from each
input port to the switch fabric 210. In one embodiment, X is
16, thereby allowing the packet header to be transferred as
double bytes.

(4) Fwd_Ack[N:0] - Forwarding Decision Acknowledgment Signals
These forwarding decision acknowledgment signals are
generated by the switch fabric 210 in response to corresponding
forwarding request signals from the input ports
(see Fwd_Req[N:0] above). These signals are deasserted
while the forwarding decision is not ready. When a forwarding
decision acknowledgment signal does become asserted,
the corresponding input port should assume the forwarding
decision bus (see Fwd_Decision[Y:0] below) has a valid
forwarding decision. After detecting its forwarding decision
acknowledgment, the corresponding input port may make
a new forwarding request, if needed.

(5) Fwd_Decision[Y:0] - Shared Forwarding Decision Bus
This forwarding decision bus is shared by all input ports.
It indicates the output port number(s) on which to forward
the packet. The forwarding decision may also include data
indicative of the outgoing packet's priority, VID insertion,
DA replacement, and other information that may be useful
to the input ports.

SWITCH FABRIC OVERVIEW

Having described the interface between the input ports
and the switch fabric 210, the internal details of the switch
fabric 210 will now be described. Referring to FIG. 3, a
block diagram of an exemplary switch fabric 210 is
depicted. In general, the switch fabric 210 is responsible for
directing packets from an input port to an output port. The
goal of the switch fabric 210 is to generate forwarding
decisions to the input ports in the shortest time possible to
keep the delay through the switch low and to achieve wire
speed switching on all ports. The primary functions of the
switch fabric are performing real-time packet header
matching, Layer 2 (L2) based learning, L2 and Layer 3 (L3)
aging, forming L2 and L3 search keys for searching and
retrieving forwarding information from the forwarding database
memory 140 on behalf of the input ports, and providing
a command interface for software to efficiently manage
entries in the forwarding database memory 140.

Layer 2 based learning is the process of constantly
updating the MAC address portion of the forwarding database
140 based on the traffic that passes through the switching
device. When a packet enters the switching device, an
entry is created (or an existing entry is updated) in the
database that correlates the MAC source address (SA) of the
packet with the input port upon which the packet arrived. In
this manner, a switching device "learns" on which subnet a
node resides.

Aging is carried out on both link and network layers. It is
the process of time stamping entries and removing expired
entries from the forwarding database memory 140. There are
two types of aging: (1) aging based on MAC SA, and (2)
aging based on MAC destination address (DA). The former
is for Layer 2 aging and the latter aids in removal of inactive
Layer 3 flows. Thus, aging helps reclaim inactive flow space
for new flows. At predetermined time intervals, an aging
field is set in the forwarding database entries. Entries that are
found during MAC SA or MAC DA searching will have
their aging fields cleared. Thus, active entries will have an
aged bit set to zero, for example. Periodically, software or
hardware may remove the inactive (expired) entries from the
forwarding database memory 140, thereby allowing for
more efficient database management. Aging also enables
connectivity restoration to a node that has "moved and kept
silent" since it was learned. Such a node can only be reached
through flooding.

Before discussing the exemplary logic for performing
search key formation, the process of search key formation
will now briefly be described. According to one embodiment
of the present invention, packets are broadly categorized in
one of two groups, either L2 entries or L3 entries. The L3
entries may be further classified as being part of one of
several header classes. Exemplary header classes include:
(1) an Address Resolution Protocol (ARP) class indicating
the packet header is associated with an ARP packet; (2) a
reverse ARP (RARP) class indicating the packet header is
associated with a RARP packet; (3) a PIM class indicating
the packet header is associated with a PIM packet; (4) a
Reservation Protocol (RSVP) class indicating the packet
header is associated with an RSVP packet; (5) an Internet
Group Management Protocol (IGMP) class indicating the
packet header is associated with an IGMP packet; (6) a
Transmission Control Protocol (TCP) flow class indicating
the packet header is associated with a TCP packet; (7) a
non-fragmented User Datagram Protocol (UDP) flow class
indicating the packet header is associated with a non-fragmented
UDP packet; (8) a fragmented UDP flow class
indicating the packet header is associated with a fragmented
UDP packet; (9) a hardware routable Internet Protocol (IP)
class indicating the packet header is associated with a
hardware routable IP packet; and (10) an IP version six (IP
V6) class indicating the packet header is associated with an
IP V6 packet.

In one embodiment of the present invention, search keys
are formed based upon an encoding of the header class and
selected information from the incoming packet's header. L2
search keys may be formed based upon the header class, the
L2 address and the VID. L3 search keys may be formed
based upon the header class, an input port list, and selectable
L3 header fields based upon the header class, for example.
Masks may be provided on a per header class basis in local
switch element 100 memory to facilitate the header field
selection, in one embodiment.

In the embodiment depicted in FIG. 3, the switch fabric
210 includes a header preprocess arbiter 360, packet header
preprocessing logic 305, a search engine 370, learning logic
350, a software command execution block 340, and a
forwarding database memory interface 310.

The header preprocess arbiter 360 is coupled to the packet
header preprocessing logic 305 and to the input ports of the
network interface 205, the cascading interface 225, and the
CPU interface 215. The input ports transfer packet headers
to the switch fabric 210 and request forwarding decisions in
the manner described above, for example.

The switch fabric 210 may support mixed port speeds by
giving priority to the faster network links. For example, the
header preprocess arbiter 360 may be configured to arbitrate
between the forwarding requests in a prioritized round robin
fashion giving priority to the faster interfaces by servicing
each fast interface (e.g., Gigabit Ethernet port) for each N
slower interfaces (e.g., Fast Ethernet ports).

Upon selecting a forward request to service, the header
preprocess arbiter 360 transfers the corresponding packet
header to the header preprocess logic 305. The header
preprocessing logic 305 performs L2 encapsulation filtering
and alignment, and L3 header comparison and selection
logic.

The search engine 370 is coupled to the forwarding
database memory interface 310 for making search requests
and to the header preprocessing logic 305 for information
for generating search keys. The search engine 370 is also
coupled to the learning logic 350 to trigger the learning
processing. The search engine 370 contains logic for scheduling
and performing accesses into the forwarding database
memory 140 and executes the forward and filter algorithm
including performing search key formation, merging L2 and
L3 results retrieved from the forwarding database memory
140, filtering, and generating forwarding decisions to the
input ports. For purposes of learning, updated
forwarding database entry information such as a cleared age
bit [...] output [...] the learning
logic 350 at the appropriate time during the searching cycle
for update of the forwarding database memory 140.
As will be discussed below, when search results become
available from the forwarding database memory
140, the search engine 370 generates and transfers a forwarding
decision to the input port.

The forwarding database memory interface 310 accepts
and arbitrates access requests to the forwarding database
memory 140 from the search engine 370 and the software
command execution block 340.

The software command execution block 340 is coupled to
the CPU bus. Programmable command, status, and internal
registers may be provided in the software command execution
block 340 for exchanging information with the CPU
161. Importantly, by providing a relatively small command
set to the CPU, the switch fabric 210 shields the CPU from
the tens or hundreds of low-level instructions that may be
required depending upon the forwarding database memory
implementation. For example, in an architecture providing
the CPU with direct access to a content addressable memory,
for example, a great deal of additional software would be
required to access the forwarding database memory. This
additional software would be unnecessarily redundant, in
light of the fact that the switch fabric 210 already has
knowledge of the forwarding database memory 140 interface.

Additional efficiency considerations are also addressed by
the present invention with respect to architectures having
distributed forwarding databases. For example, in a distributed
architecture, it may be desirable to keep an image of the
entire forwarding database in software. If this is the case,
presumably, periodically the software will need to read all
entries from each of the individual forwarding databases.
Since the forwarding database(s) may be very large, many
inefficient programmed input/outputs (PIOs) may be
required by an architecture providing the CPU with direct
access to the forwarding database(s).

Thus, it would be advantageous to employ the switch
fabric 210 as an intermediary between the CPU 161 and the
forwarding database 140 as discussed herein. According to
one embodiment of the present invention, the software
command execution block 340 may provide a predetermined
set of commands to the software for efficient access to and
maintenance of the forwarding database memory 140. The
predetermined set of commands described below have been
defined in such a way so as to reduce overall PIOs. These
commands as well as the programmable registers will be
discussed in further detail below.

An exemplary set of registers includes the following: (1)
a command and status register for receiving commands from
the CPU 161 and indicating the status of a pending command;
(2) a write new entry register for temporarily storing
a new entry to be written to the forwarding database 140; (3)
a write key register for storing the key used to locate the
appropriate forwarding database entry; (4) a write data
register for storing data to be written to the forwarding
database 140; (5) an address counter register for storing the
location in the forwarding database memory to read or
update; (6) a read entry register for storing the results of a
read entry operation; and (7) a read data register for storing
the results of other read operations.

In one embodiment of the present invention, an address
counter register is used to facilitate access to the forwarding
database memory 140. The software only needs to program
the address register with the start address of a sequence of
reads/writes prior to the initial read/write of the sequence.
After the initial memory access, the address register will be
automatically incremented for subsequent accesses.
Advantageously, in this manner, additional PIOs are saved,
because the software is not required to update the address
prior to each memory access.

The software command execution block 340 is further
coupled to the forwarding database memory interface 310.
Commands and data are read from the programmable registers
by the software command execution block 340 and
appropriate forwarding database memory access requests
and events are generated as described in further detail with
reference to FIG. 9. The software command execution block
340 may also provide status of the commands back to the
software via status registers. In this manner, the software
command execution block 340 provides hardware assisted
CPU access to the forwarding database memory 140.

PACKET HEADER PROCESSING

[...] 510 may be processing the L2 header portion 475 of a packet
from a first input port, the encapsulation block 520 may be
processing the L2 encapsulation portion 480 of a packet
from a second input port, the L3 header class matching block
530 may be processing the L3 address independent portion
485 of a third input port, and the L3 address dependent block
540 may be processing the L3 address dependent portion
490 of a packet from a fourth input port.

Importantly, while the present embodiment is illustrated
with reference to four pipeline stages, it is appreciated that
more or less stages may be employed and different groupings
of packet header information may be used. The present
identification of header portions depicted in FIG. 4 has been
selected for convenience. The boundaries for these header
portions 475-490 are readily identifiable based upon known
characteristics of the fields within each of the exemplary
header portions 475-490. Further, the header portions
475-490 can be processed in approximately equal times.

In any event, continuing with the present example, the
arbiters 501-504 coordinate access to the stages of the
pipeline. The arbiters 501-504 function so as to cause a
given packet to be sequentially processed one stage at a time
starting with the address accumulation block 510 and ending
with the L3 address dependent block 540. The first stage of
the pipeline, the address accumulation block 510, is configured
to extract the MAC SA and MAC DA from the L2
header portion 475 of the packet header. The address accumulation
block 510 then transfers the extracted information
to the search engine for use as part of the L2 search key 545.

The encapsulation block 520 is configured to determine
the type of encapsulation of the L2 encapsulation portion
480 of the packet header. As indicated above, the relative

FIG. 4 illustrates the portions of a generic packet header positioning of fields following the L2 encapsulation portion 

that are operated upon by the pipelined header preprocessing 3S varies depending upon the type of encapsulation employed, 

subblocks of FIG. 5 according to one embodiment of the Therefore, the encapsulation block further calculates an 

present invention. According to this embodiment, a packet offset from the start of the L2 encapsulation portion 480 to 

header 499 is partitioned into four portions, an 12 header the start of the L3 address independent portion 485. The 

portion 475, an 12 encapsulation portion 480, an L3 address offset may then be used by the subsequent stages to align the 

independent portion 485, and an L3 address dependent ^ packet header appropriately. 

portion 490. The L3 header class matching block 530 is configured to 

In this example, the 12 header portion 475 may comprise determine the class of the L3 header by comparing the 

a MAC S A field and a MAC DA field. Depending upon the packet header to a plurality of programmable registers that 

type of encapsulation (e.g., IEEE 802.1 Q tagged or LLC- may contain predetermined values known to facilitate iden- 

SNAP), the L2 encapsulation portion may include a virtual 45 tificationof the L3 header class. Each programmable register 

local area network (VLAN) tag or an 802.3 type/length field should be set such that only one header class will match for 

and an LLC SNAP field. The L3 address independent any given packet. Once a given register has been determined 

portion 485 may comprise an IP flags/fragment offset field to match, a class code is output to the search engine for use 

and a protocol field. Finally, the L3 address dependent as part of the L3 search key. 

portion 490 may comprise an IP source field, an IP desti- 50 The L3 address dependent block 540 is configured to 

nation field, a TCP source port, and a TCP destination port. extract. appropriate bytes of the L3 address dependent por- 

Note that the relative position of fields in the L3 address tio n 490 for use in the L3 search key 555. This extraction 

independent portion 485 and the L3 address dependent m ay be performed by employing M CPU programmable 

portion 490 may be different depending upon the type of byte and bit masks, for example. The programmable byte 

encapsulation in the L2 encapsulation portion 480. 55 and bit mask corresponding to the header class, determined 

FIG. 5 illustrates pipelined header preprocessing sub- by the L3 header class matching block 530, may be used to 

blocks according to one embodiment of the present inven- mask off the desired fields. Advantageously, pipelining the 

tion. According to this embodiment, the header preprocess- header preprocess logic 305 saves hardware implementation 

ing logic 305 may be implemented as a four stage pipeline. overhead. For example, multiple packet headers may be 

Each stage in the pipeline operates on a corresponding 60 processed simultaneously in a single processing block rather 

portion of the packet header 499. The pipeline depicted than four processing blocks that would typically be required 

includes four stage arbiters 501-504, an address accumula- to implement the logic of FIG. 5 in a non-pipelined fashion, 

tion block 510, an encapsulation block 520, an L3 header Note that additional parallelism may be achieved by, further 

class matching block 530, and an L3 address dependent pipelining the above header preprocessing with forwarding 

block 540. In this example, the header preprocessing logic 65 database memory 140 accesses. For example, there is no 

305 may simultaneously process packet headers from four need for L2 searching to wait for a packet to complete the 

input ports. For example, the address accumulation block pipeline of FIG. 5, L2 searches may be initiated as soon as 
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a packet header completes the first stage and an L2 search 
key becomes available from the search engine 370. Subse- 
quent L2 searches may be initiated as new L2 search keys 
become available and after the previous forwarding database 
memory access has completed. 
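
The stage overlap described above for the header preprocessing logic 305 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the stage names mirror blocks 510-540, but the function `simulate_pipeline`, the one-cycle-per-stage timing, and the rule that a new header enters every cycle are hypothetical simplifications (the text says only that the header portions can be processed in approximately equal times).

```python
# Hypothetical stage labels; the numbers follow blocks 510-540 of FIG. 5.
STAGES = [
    "address_accumulation_510",
    "encapsulation_520",
    "l3_class_matching_530",
    "l3_address_dependent_540",
]


def simulate_pipeline(headers):
    """Return one dict per cycle mapping stage name -> header in that stage.

    Assumptions (not from the specification): every stage takes exactly one
    cycle, and a new header enters the first stage on every cycle.
    """
    timeline = []
    total_cycles = len(headers) + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for stage_index, stage_name in enumerate(STAGES):
            header_index = cycle - stage_index
            if 0 <= header_index < len(headers):
                occupancy[stage_name] = headers[header_index]
        timeline.append(occupancy)
    return timeline


if __name__ == "__main__":
    ports = ["port1", "port2", "port3", "port4"]
    for cycle, occupancy in enumerate(simulate_pipeline(ports)):
        print(cycle, occupancy)
```

In this model the pipeline is full at the fourth cycle, when each of the four blocks holds a header from a different input port. A header leaves the first stage after one cycle, which is the point at which the text allows its L2 search to begin.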

FORWARDING DATABASE MEMORY 

FIG. 6 illustrates a physical organization of the forward- 
ing database memory of FIG. 2 according to one embodi- 
ment of the present invention. In the embodiment depicted, 
the forwarding database memory 140 includes two cascaded 
fully associative content addressable memories (CAMs), 
610 and 620, and a static random access memory (SRAM) 
630. 

The switch fabric 210, in collaboration with the CPU 161, 
maintains a combined link layer (also referred to as "Layer
2") and network layer (also referred to as "Layer 3") packet 
header field-based forwarding and filtering database 140. 
The forwarding and filtering database 140 is stored primarily 
in off-chip memory (e.g., one or more CAMs and SRAM) 
and contains information for making real-time packet for- 
warding and filtering decisions. 

The assignee of the present invention has found it advan- 
tageous to physically group Layer 2 (L2) entries and Layer 
3 (L3) entries together. Therefore, at times the group of L2 
entries may be referred to as the "L2 database" and the group 
of L3 entries may be logically referred to as the "L3 
database." However, it is important to note that the L2 
database and L3 database may span CAMs. That is, either 
CAM may contain L2 and/or L3 entries. Both Layer 2 and 
Layer 3 forwarding databases are stored in the CAM-RAM 
chip set. For convenience, the data contained in the CAM 
portion of the forwarding database memory 140 will be 
referred to as "associative data," while the data contained in 
the SRAM portion of the forwarding database memory 140 
will be referred to as "associated data." 

As will be explained further below, entries may be 
retrieved from the L2 database using a key of a first size and 
entries may be retrieved from the L3 database using a key of 
a second size. Therefore, in one embodiment, the switching 
element 100 may mix CAMs of different widths. Regardless 
of the composition of the forwarding database memory 140, 
the logical view to the switch fabric 210 and the CPU 161 
should be a contiguous memory that accepts bit match 
operations of at least two different sizes, where all or part of 
the memory is as wide as the largest bit match operation. 

Different combinations of CAMs are contemplated. 
CAMs of different widths, and different internal structures 
(e.g., mask per bit (MPB) vs. global mask) may be 
employed. In some embodiments, both CAMs 610 and 620 
may be the same width, while in other embodiments the 
CAMs 610 and 620 may have different widths. For example, 
in one embodiment, both CAMs 610 and 620 may be 
128-bits wide and 2K deep or the first CAM 610 may be 
128-bits wide and the second CAM 620 may be 64-bits
wide. Since L2 entries are typically narrower than L3
entries, in the mixed CAM width embodiments, it may be
advantageous to optimize the narrower CAM width for L2
entries. In this case, only L2 entries can be stored in the
narrower CAM, while both L2 and L3 entries may still
reside in the wider CAM.

While the present embodiment has been described with 
reference to cascaded dual CAMs 610 and 620, because the 
logical view is one contiguous block, it is appreciated that 
the L2 and L3 databases may use more or fewer CAMs than
depicted above. For example, the L2 and L3 databases may
be combined in a single memory in alternative
embodiments.

Having described an exemplary physical organization of
the forwarding database memory 140, the data contained 
therein will now briefly be described. One or more lines of 
the SRAM 630 may be associated with each entry in the 
CAM portion. It should be noted that a portion of the CAM 
could have been used as RAM. However, one of the goals 
of partitioning the associative data and the associated data is 
to produce a minimum set of associative data for effective 
searching while storing the rest of the associated data in a 
separate memory, a cheaper RAM, for example. As will be 
discussed below, with respect to FIGS. 8A-C, separating the 
associative data and the associated data allows the forward- 
ing database memory 140 to be more efficiently searched 
and updated. Additional advantages are achieved with an 
efficient partitioning between associative data and associ- 
ated data. For example, by minimizing the amount of data in 
the associative data fields, less time and resources are 
required for access and maintenance of the forwarding 
database such as the occasional shuffling of L3 entries that 
may be performed by the CPU 161. Additionally, the
efficient partitioning reduces the amount of time required for
the occasional snap shots that may be taken of the entire 
forwarding database for maintenance of the aggregate copy 
of forwarding databases in the central memory 163. 

Generally, the associative data is the data with which the 
search key is matched. Packet address information is typi- 
cally useful for this purpose. In one embodiment, the asso- 
ciative data may contain one or more of the following fields 
depending upon the type of entry (e.g., L2 or L3): 

(1) a class field indicating the type of associative entry; 

(2) a media access control (MAC) address which can be
matched to an incoming packet's MAC DA or SA field;

(3) a virtual local area network (VLAN) identifier (VID)
field;

(4) an Internet Protocol (IP) destination address; 

(5) an IP source address; 

(6) a destination port number for TCP or non-fragmented 
UDP flows; 

(7) a source port number for TCP or non-fragmented UDP 
flows; and 

(8) an input port list for supporting efficient multicast 
routing. 

The associative data may also contain variable bits of the 
above by employing a mask per bit (MPB) CAM as 
described above. 
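
The split between associative data (matched in the CAM) and associated data (read from the SRAM) can be sketched as follows. This is a minimal model, not the patented implementation: the names `CamEntry`, `cam_search`, and `lookup` are hypothetical, the mask convention (1 = compare the bit, 0 = don't care) is an assumption, and returning the lowest-addressed match is an assumed priority rule that the text does not specify.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CamEntry:
    value: int          # associative data: the stored bit pattern
    mask: int           # mask per bit: 1 = compare this bit, 0 = don't care
    valid: bool = True


def cam_search(cam: List[CamEntry], key: int) -> Optional[int]:
    """Return the address of the first valid entry matching key, else None."""
    for address, entry in enumerate(cam):
        if entry.valid and (key & entry.mask) == (entry.value & entry.mask):
            return address
    return None


def lookup(cam: List[CamEntry], sram: List[Any], key: int) -> Optional[Any]:
    """Search the CAM, then read the associated data at the same address."""
    address = cam_search(cam, key)
    return None if address is None else sram[address]


if __name__ == "__main__":
    # Two illustrative IP destination entries: 10.0.1.0/24 and 10.0.0.0/8.
    cam = [CamEntry(0x0A000100, 0xFFFFFF00), CamEntry(0x0A000000, 0xFF000000)]
    sram = [{"port_mask": 0b0010}, {"port_mask": 0b0100}]
    print(lookup(cam, sram, 0x0A000105))
    print(lookup(cam, sram, 0x0A020304))
    print(lookup(cam, sram, 0x0B000000))
```

Keeping only the match fields in `CamEntry` and everything else in the `sram` list mirrors the goal stated above of a minimum set of associative data, with the remaining forwarding information held in cheaper memory.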

The associated data generally contains information such 
as an indication of the output port(s) to which the packet 
may be forwarded, control bits, information to keep track of 
the activeness of the source and destination nodes, etc. Also, 
the associated data includes the MAC address for MAC DA 
replacement and the VID for tagging. Specifically, the 
associated data may contain one or more of the following 
fields: 

(1) a port mask indicating the set of one or more ports the 
packet may be forwarded to; 

(2) a priority field for priority tagging and priority
queuing;

(3) a best effort mask indicating which ports should queue 
the packet as best effort; 

(4) a header only field indicating that only the packet 
header should be transferred to the CPU; 

(5) a multicast route field for activating multicast routing; 

(6) a next hop destination address field defining the next
hop L2 DA to be used to replace the original DA;

(7) a new VID field that may be used as a new tag for the
packet when routing between VLANs requires an
outgoing tag different than the incoming tag, for example;

(8) a new tag field indicating that the new VID field 
should be used; 

(9) an aged source indication for determining which L2 
entries are active in the forwarding database, and which 
may be removed; 

(10) an aged destination indication for implementing
IEEE 802.1d type address aging to determine which L2
or L3 entries are active in the forwarding database, and
which may be removed;

(11) an L2 override indication for instructing the merge 
function to use the L2 result for forwarding even when 
an L3 result is available; 

(12) a static indication for identifying static entries in the
forwarding database that are not subject to automatic
L2 learning or aging;

(13) a distributed flow indication for use over internal 
(cascading) links to control the type of matching cycle 
(L2 or L3) used on the next switching element; and 

(14) a flow rate count for estimating the arrival rate of an 
entry or group of entries. 

FORWARDING DATABASE SEARCH 
SUPERCYCLE DECISION FLOW 

FIG. 7 is a flow diagram illustrating the forwarding 
database memory search supercycle decision logic accord- 
ing to one embodiment of the present invention. At step 702, 
depending upon whether the packet is being received on an 
internal link or an external link, processing continues with 
step 704 or step 706, respectively. 

Internal link specific processing includes steps 704, 712, 
714, 720, 722, and 724. At step 704, since the packet has 
been received from an internal link, a check is performed to 
determine if the packet is part of a distributed flow. If so, 
processing continues with step 714. If the packet is not part 
of a distributed flow, then processing continues with step 
712. 

No learning is performed for the internal links. Therefore,
at step 712, only a DA search is performed on the forwarding
database memory 140.

At step 714, an L3 search is performed to retrieve a 
forwarding decision for the incoming packet. At step 720, a 
determination is made as to whether a matching L3 entry 
was found during the search of step 714. If not, then, at step 
722, the class action defaults are applied (e.g., forwarding 
the packet or the packet header to the CPU 161) and 
processing continues at step 780. If a matching L3 entry was
found, then, at step 724, the associated data corresponding 
to the matching entry is read from the forwarding database 
140 and processing continues at step 780. 

At step 708, Layer 2 learning is performed. After the 
learning cycle the header class is determined and, at step 
716, the header class is compared against the L3 unicast 
route header class. If there is a match at step 716, processing 
continues with step 726; otherwise, another test is performed 
at step 718. At step 718, the header class is compared to the 
remaining L3 header classes. 

Specific processing for packets associated with headers 
classified as L2 includes steps 728 and 738. If the header 
class was determined not to be an L3 header class, then at 
step 728, a DA search is performed for an L2 forwarding 
decision. At step 738, the L2 decision algorithm is applied
and processing continues at step 780.

Specific processing for packets associated with headers
classified as L3 route includes steps 726, 732, 734, 736, 748,
750, 754, 756, 752, 758, and 760. At step 726, an L3 search
is performed on the forwarding database 140. If a matching
L3 entry is found (step 732), then the associated data
corresponding to the matching entry is read from the
forwarding database 140 (step 736). Otherwise, at step 734, the
class action options are applied and processing continues
with step 780.

If the packet is a multicast packet (step 748), then the
Time_To_Live (TTL) counter is tested against zero or one
(step 750); otherwise processing continues at step 752. If
TTL was determined to be zero or one in step 750, then the
packet is forwarded to the CPU 161 prior to continuing with
step 780. Otherwise, at step 754, a destination address search
is performed to retrieve an L2 forwarding entry from the
forwarding database 140 and the L2 decision algorithm is
applied (step 756).

If the packet was determined to be a unicast packet in step
748, then TTL is tested against zero or one (step 752). If TTL
was determined to be zero or one, then the packet is
forwarded to the CPU 161. Otherwise the L3 match is
employed at step 760 and processing continues with step
780.

Specific processing for packets associated with headers
classified as L3 includes steps 730, 740, 742, 762, 764, 766,
744, 746, 768, and 770. At step 730, an L3 search is
requested from the forwarding database 140. If a matching
L3 entry is found (step 740), then the associated data
corresponding to the matching entry is read from the
forwarding database 140 (step 744). Otherwise, when no
matching L3 entry is found, at step 742 a DA search is
performed to find a matching L2 entry in the forwarding
database 140.

If the forwarding decision indicates the L2 decision
should be used (step 762), then the L2 decision algorithm is
applied at step 770. Otherwise, the class action options are 
applied (step 764). If the class action options indicate the 
packet is to be forwarded using the L2 results (step 766), 
then processing continues at step 770. Otherwise, the pro- 
cessing branches to step 780. 

At step 746, a destination address search is performed on
the forwarding database 140 using the packet's destination
address. If the forwarding decision indicates the L2 decision
should be used (step 768), then processing continues with
step 770. Otherwise, the associated data retrieved at step 744
will be employed and processing continues with step 780. At
step 770, the L2 decision algorithm is applied and
processing continues with step 780. Finally, the forwarding decision
is assembled (step 780).
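
The branching described for FIG. 7 can be traced with a short function that returns the step numbers a packet visits. This is a hedged sketch rather than the decision logic itself: the function name, its parameters, and the class labels "L2", "L3", and "L3_ROUTE" are hypothetical. It also assumes that step 712 proceeds to step 780, treats step 706 as a pass-through because the text names it only as a branch target, and leaves the "forward to the CPU" action on TTL expiry unnumbered.

```python
def supercycle_steps(internal_link, distributed_flow=False, header_class="L2",
                     l3_match=False, multicast=False, ttl=64,
                     use_l2_decision=False, class_action_uses_l2=False):
    """Return the list of FIG. 7 step numbers visited for one packet."""
    steps = [702]
    if internal_link:
        steps.append(704)
        if distributed_flow:
            steps += [714, 720]                   # L3 search, then match test
            steps.append(724 if l3_match else 722)
        else:
            steps.append(712)                     # DA search only, no learning
        return steps + [780]

    steps += [706, 708, 716]                      # external link: L2 learning
    if header_class == "L3_ROUTE":
        steps += [726, 732]
        if not l3_match:
            return steps + [734, 780]             # class action options
        steps += [736, 748]
        if multicast:
            steps.append(750)
            if ttl > 1:
                steps += [754, 756]               # DA search, L2 decision
        else:
            steps.append(752)
            if ttl > 1:
                steps.append(760)                 # employ the L3 match
        return steps + [780]                      # TTL of 0 or 1 goes to the CPU

    steps.append(718)
    if header_class == "L3":
        steps += [730, 740]
        if l3_match:
            steps += [744, 746, 768]
            if use_l2_decision:
                steps.append(770)
        else:
            steps += [742, 762]
            if use_l2_decision:
                steps.append(770)
            else:
                steps += [764, 766]
                if class_action_uses_l2:
                    steps.append(770)
        return steps + [780]

    return steps + [728, 738, 780]                # plain L2 header class


if __name__ == "__main__":
    print(supercycle_steps(internal_link=False))
    print(supercycle_steps(False, header_class="L3_ROUTE", l3_match=True))
```

Every path ends at step 780, where the forwarding decision is assembled; the paths differ only in which searches are issued before that point.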

As illustrated by FIG. 7, packet processing for packets
arriving on external links typically requires two to four
associative lookups (i.e., two or more of the following: L2
SA match, L2 learning, unicast route class match, L2 DA
match). However, according to an embodiment of the
present invention, the L2 DA match may be eliminated
whenever a port update access is needed for L2 learning,
thus conserving valuable cycles. While the elimination of
the L2 DA match may result in flooding one extra packet
when a topology change occurs, the port update access is a
relatively rare event. Advantageously, in this manner, the
number of associative lookups is normally limited to a
maximum of three per packet, without compromising
functionality.
FORWARDING DATABASE SEARCH 
SUPERCYCLE TIMING

The search supercycle timing will now be described in
view of the novel partitioning of forwarding information
within the forwarding database 140 and the pipelined
forwarding database access.

FIGS. 8A-C are timing diagrams illustrating the three
worst case content addressable memory search supercycles.
Advantageously, the partitioning of data among the
CAM-RAM architecture described with respect to FIG. 4 allows
forwarding database memory accesses to be pipelined. As
should be appreciated with reference to FIGS. 8A-C, the
switch fabric saves valuable cycles by hiding RAM reads
and writes within CAM accesses. For example, RAM reads
and writes can be at least partially hidden within the slower
CAM accesses for each of the supercycles depicted.

Referring now to FIG. 8A, a search supercycle including
an L2 SA search and an L2 DA search is depicted. The first
CAM short search represents the L2 SA search of the CAMs
410 and 420 for purposes of L2 learning. As soon as the L2
SA search has completed, the associated data in the SRAM
630 may immediately be updated (e.g., RAM read and RAM
write) while the next CAM short search (L2 DA search) is
taking place.

FIG. 8B illustrates a case in which L2 and L3 searches are
combined. The first CAM short search represents an L2 SA
search. The CAM long search represents a search of the
forwarding database 140 for a matching L3 entry. Again,
if an update for the L2 SA match is required, the
SRAM read and write may be performed during the
following CAM access. If a matching L3 entry is found, then
the RAM burst read of the associated data corresponding to
the matching entry can be performed during the second
CAM short search, which represents an L2 DA search.

FIG. 8C illustrates another case in which L2 and L3
searches are combined. However, in this case, the second
CAM access is not performed.

It should be appreciated that the pipelining of the CAM
and SRAM effectively decouples the speed of the memories.
Further, the partitioning between the CAM(s) and the
SRAM should now be appreciated. Because CAM accesses
are slower than the accesses to the SRAM, it is desirable to
allocate as much of the forwarding information as possible
to the SRAM.

Observing the gaps between the completion of the RAM
writes and the completion of the second CAM access, it is
apparent that increasing the speed of the CAM(s) can reduce
these gaps. The assignee of the present invention anticipates
future technological developments to allow faster CAMs to
be developed, thereby creating additional resources for
additional or faster ports, for example.

While only the pipelined forwarding database access is
illustrated in FIGS. 8A-C, it is important to note there are
many other contributions to the overall speed of the switch
fabric 210 of the present invention. For example, as
described above, the highly pipelined switch fabric logic
includes: pipelined header processing, pipelined forwarding
database access, and pipelined forwarding database/header
processing.

GENERALIZED COMMAND PROCESSING

Having described an exemplary environment in which
one embodiment of the present invention may be
implemented, the general command processing will now be
described. FIG. 9 is a flow diagram illustrating generalized
command processing for typical forwarding database
memory access commands according to one embodiment of
the present invention. At step 910, the CPU programs
appropriate data registers in the software command
execution block 340 using PIOs. For example, certain forwarding
database access commands are operable upon a specified
address that should be supplied by the CPU 161 prior to
issuing the command.

At step 920, after the CPU 161 has supplied the
appropriate parameters for the command, the CPU issues the
desired command. This may be accomplished by writing a
command code corresponding to the desired command to a
command register.

According to the present embodiment, the CPU 161 polls
a status register to determine whether the command issued
in step 920 is complete (step 930). Alternatively, since the
commands have a predetermined maximum response time, the
CPU 161 need not poll the status register; rather, the CPU
161 is free to perform other functions and may check the
status register at a time when the command is expected to be
complete. Another alternative is to provide an interrupt
mechanism for the switch fabric to notify the CPU 161 when
the requested command is complete.

At step 940, after the command is complete, the CPU may
act on the result(s). The results may be provided in memory
mapped registers in the software command execution block
340, for example. In this case, the CPU 161 may retrieve the
result(s) with a PIO read if necessary.

At step 950, the issuance of the command by the CPU 161
triggers logic in the software command execution block 340,
for example, to load the appropriate command parameters.
These command parameters are assumed to have been
previously provided by the CPU 161 at step 910.

At step 960, the software command execution block 340
issues the appropriate forwarding database memory specific
command(s) to perform the requested task. In this manner,
the CPU 161 requires no knowledge of the underlying raw
instruction set for the particular memory or memories used
to implement the forwarding database 140.

At step 970, upon completion of the forwarding database
140 access, the software command execution block 340
updates the result(s) in appropriate interface registers. Then,
at step 980, the software command execution block 340 sets
one or more command status flag(s) to indicate to the CPU
161 that the command is complete. In other embodiments,
one or more additional status flags may be provided to
indicate whether or not the command completed
successfully, whether or not an error occurred, and/or other
information that may be useful to the CPU 161.

Having described the general command processing flow,
an exemplary set of commands and their usage will now be
described.

EXEMPLARY COMMAND SET

According to the present embodiment, one or more
commands may be provided for accessing entries in the
forwarding database 140. In particular, it may be useful to read a
newly learned Layer 2 (L2) entry. To retrieve an L2 entry,
the CPU 161 first programs counters in the switch fabric 210
for addressing the forwarding database memory 140.
Subsequently, the CPU 161 writes the Read_CAM_Entry
command to a command register in the switch fabric 210.
When it is the CPU's turn to be serviced by the switch fabric,
the switch fabric will read the counters and perform the access
to the forwarding database memory 140 to retrieve the newly
learned L2 entry. The switch fabric 210 then writes the L2
entry to an output register that is accessible by the CPU 161
and sets the command status done flag. After the command
is complete, and assuming the command was successful, the
CPU 161 may read the L2 entry from the output register.

The Read_CAM_Entry command in combination with
the address counter register is especially useful for burst
reads in connection with updating the software's image of
the entire forwarding database, for example. Because the
hardware automatically increments the address counter
register at the completion of each memory access, the
software only needs to program the address register prior to
the first memory access. In this manner, the software may
read the entire forwarding database 140 very efficiently.
Similarly, it will be apparent that other forwarding memory
accesses are also simplified, such as sequences of writes
during L3 entry initialization. The mechanism for writing
entries to the forwarding database memory 140 will now be
described.
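
The PIO saving from the auto-incrementing address counter can be made concrete with a toy model. This is a sketch under assumptions: the class `ForwardingDbPort`, its method names, and the accounting of one PIO per entry read (the command write and result read collapsed into one operation) are hypothetical and chosen only to show the counting argument.

```python
class ForwardingDbPort:
    """Toy model of CPU access through an auto-incrementing address counter."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.address_counter = 0
        self.pio_count = 0              # programmed I/O operations issued

    def write_address_counter(self, address):
        self.pio_count += 1
        self.address_counter = address

    def read_entry(self):
        # Assumption: one PIO per entry read, for counting purposes only.
        self.pio_count += 1
        entry = self.memory[self.address_counter]
        self.address_counter += 1       # hardware increments after each access
        return entry


def burst_read(port, start, count):
    """Program the start address once, then read count consecutive entries."""
    port.write_address_counter(start)
    return [port.read_entry() for _ in range(count)]


if __name__ == "__main__":
    port = ForwardingDbPort(range(100, 108))
    print(burst_read(port, start=2, count=4), port.pio_count)
```

Under these assumptions a burst of N reads costs N + 1 PIOs, whereas reprogramming the address before every access would cost 2N.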

It is also convenient for the CPU 161 to be able to write
an entry to the forwarding database memory. In particular, it
may be useful to initialize all L3 entries in the forwarding
database with a predetermined filler (or dummy) value. This
command may also be useful for invalidation of L3 entries
or before performing a mask update in a mask per bit (MPB)
content addressable memory (CAM), for example. A
Write_CAM_Entry command is provided for this purpose. Again,
the CPU 161 should first program the appropriate counters
in the switch fabric 210. The CPU 161 also provides the L3
key to be written to the forwarding database memory 140.
After these steps, the CPU 161 may issue the
Write_CAM_Entry command using a PIO write to the command register.
The CPU 161 may then begin polling the command status.
The switch fabric 210 reads the parameters provided by the
CPU 161 and initializes the corresponding L3 entry to a
predetermined filler (or dummy). After the write is complete,
the switch fabric 210 notifies the CPU 161 of the status of
the command by setting the command status done flag.
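
The program, issue, poll sequence used by Write_CAM_Entry and Read_CAM_Entry can be sketched as a small handshake model. The following is illustrative only: the class `SoftwareCommandBlock`, its attribute names, and the helper `cpu_write_entry` are hypothetical, the register map is not taken from the specification, and the switch fabric's concurrent servicing is modeled by calling `service()` inside the polling loop.

```python
class SoftwareCommandBlock:
    """Toy model of the command/status handshake (not the real register map)."""

    def __init__(self, cam_size):
        self.cam = [None] * cam_size
        self.address_counter = 0
        self.write_key = None
        self.read_entry_reg = None
        self.command = None
        self.done = False               # command status done flag

    # CPU side: each call stands for one PIO.
    def issue(self, command):
        self.done = False
        self.command = command

    def poll_done(self):
        return self.done

    # Switch fabric side: runs when it is the CPU's turn to be serviced.
    def service(self):
        if self.command is None:
            return
        if self.command == "Write_CAM_Entry":
            self.cam[self.address_counter] = self.write_key
        elif self.command == "Read_CAM_Entry":
            self.read_entry_reg = self.cam[self.address_counter]
        else:
            raise ValueError("unsupported command: " + self.command)
        self.address_counter += 1       # auto-increment after the access
        self.command = None
        self.done = True


def cpu_write_entry(block, address, key, max_polls=10):
    """Program the parameters, issue the command, then poll for completion."""
    block.address_counter = address
    block.write_key = key
    block.issue("Write_CAM_Entry")
    for _ in range(max_polls):
        if block.poll_done():
            return True
        block.service()                 # stands in for concurrent hardware
    return False


if __name__ == "__main__":
    block = SoftwareCommandBlock(cam_size=4)
    print(cpu_write_entry(block, address=2, key=0xABC), block.cam)
```

Because the CPU touches only the parameter, command, and status registers, it needs no knowledge of the memory-specific operations that `service()` performs on its behalf.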

Commands may also be provided for accessing associated
data. According to one embodiment of the present invention
the following operations are provided: (1) learning a
supplied address; (2) reading associated data corresponding to
a supplied search key; (3) aging forwarding database entries;
(4) invalidating entries; (5) accessing mask data, such as
mask data that may be stored in an MPB CAM, corresponding
to a particular search key; and (6) replacing forwarding
database entries.

L2 source address learning may be performed by a
Learn_L2_SA command. First, the CPU 161 programs the
appropriate registers in the switch fabric 210 with an L2
search key and a new entry to insert or a modified entry.
Then, the CPU 161 issues the Learn_L2_SA command and
begins polling the command status. The switch fabric 210
reads the data provided by the CPU 161. If an entry is not
found in the forwarding database 140 that matches the
supplied address, then the new entry will be inserted into the
forwarding database. After the insertion is complete or upon
verifying a matching entry already exists, the switch fabric
210 notifies the CPU 161 of the status of the command by
setting the command status done flag.
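
The insert-if-absent behavior described for L2 source address learning can be sketched in a few lines. This is a simplified model under assumptions: a Python dict stands in for the CAM/SRAM pair, the function name `learn_l2_sa`, the `(MAC, VID)` key shape, and the returned `(done, inserted)` tuple are hypothetical, and an existing entry is left unchanged because the text describes only verifying that it exists.

```python
def learn_l2_sa(database, search_key, new_entry):
    """Insert new_entry under search_key unless a matching entry exists.

    Returns (done, inserted): done models the command status done flag,
    inserted reports whether a new entry was actually written.
    """
    inserted = False
    if search_key not in database:
        database[search_key] = new_entry
        inserted = True
    return True, inserted


if __name__ == "__main__":
    db = {}
    key = ("00:11:22:33:44:55", 10)      # hypothetical (MAC address, VID) key
    print(learn_l2_sa(db, key, {"port": 3}))
    print(learn_l2_sa(db, key, {"port": 7}))
    print(db)
```

The done flag is set in both cases, matching the description that the CPU is notified after the insertion or upon verifying that a matching entry already exists.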

It is also convenient for the CPU 161 to be able to perform
aging. In particular, it is useful to age L2 and L3 forwarding
database entries. Age_SA and Age_DA commands are
provided for this purpose. The CPU 161 writes the
appropriate key and the modified age field to the switch fabric
interface. Then, the CPU 161 issues either the Age_SA
command or the Age_DA command. The Age_SA command
sets the source address age field in the L2 entry
corresponding to the provided search key. The Age_DA command sets
the destination address age field for the L2 or L3 entry
corresponding to the provided search key. After issuing the
command, the CPU 161 may begin polling the command
status. The switch fabric 210 reads the data provided by the
CPU 161 and updates the appropriate age field in the
matching entry. After aging is complete, the switch fabric
210 notifies the CPU 161 of the status of the command by
setting the command status done flag.

The CPU 161 may also need to have the ability to 
invalidate forwarding database entries such as aged L2 

entries, for example. The Invalidate_L2 Entry command is 

provided for this purpose. Prior to issuing the Invalidate_ 
L2„Entry command, the CPU 161 programs the appropriate 
address counters in the switch fabric 210. After issuing the 
command, the CPU 161 may begin polling the command 
status. The switch fabric 210 reads the data provided by the 
CPU 161 and resets the validity bit at the address counter 
location specified. After the entry invalidation is complete, 
the switch fabric 210 notifies the CPU 161 of the status of 
the command by setting the command status done flag. 
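Unlike the aging commands, invalidation as described here selects the entry by location (the address counter) rather than by search key, and it only resets a validity bit. A minimal sketch, assuming the table is modeled as two parallel lists (`keys` and `valid_bits`, both hypothetical names) and that a lookup skips lines whose validity bit is clear:

```python
def invalidate_l2_entry(valid_bits, address_counter):
    """Reset the validity bit at the location given by the address counter."""
    if not 0 <= address_counter < len(valid_bits):
        raise IndexError("address counter outside the table")
    valid_bits[address_counter] = False


def lookup(keys, valid_bits, key):
    """Return the index of the first valid line holding `key`, else None."""
    for index, (stored, valid) in enumerate(zip(keys, valid_bits)):
        if valid and stored == key:
            return index
    return None
```

In this model the stored key is left in place; the line simply stops participating in searches once its validity bit is reset.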

In embodiments employing MPB CAMs, typically the 
CAM stores alternating sets of data and masks. Each set of 
data has a corresponding mask. The masks allow program- 
mable selection of portions of data from the corresponding 
CAM line. Thus, it is convenient for the CPU 161 to be able 
to access the mask data corresponding to a particular address 
in the CAM. In particular, it is useful to update the mask data 
to select different portions of particular CAM lines. The 
Update_Mask command is provided for this purpose. The 
CPU 161 programs the address counter register and pro- 
grams the new mask into the appropriate register. Then, CPU 
161 issues the Update_Mask command and may begin 
polling the command status. The switch fabric 210 reads the 
parameters provided by the CPU 161 and updates the mask 
data corresponding to the specified address. After the mask 
data update is complete, the switch fabric 210 notifies the 
CPU 161 of the status of the command by setting the 
command status done flag. The CPU 161 may also read 
mask data in a similar fashion by employing a Read_Mask 
command and providing the appropriate address. 
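The effect of updating a mask can be illustrated with a small model of a masked CAM. Two assumptions are made for the sketch and are not taken from the description: a mask bit of 1 means the corresponding data bit takes part in the comparison, and data and masks are kept in two parallel lists rather than in the alternating physical layout. The class and method names are hypothetical.

```python
class MaskedCam:
    """Toy model of a CAM in which each data line has its own mask."""

    def __init__(self, width_bits, lines):
        self.full = (1 << width_bits) - 1
        self.data = [None] * lines          # None marks an empty line
        self.mask = [self.full] * lines     # default: compare every bit

    def write_data(self, address, value):
        self.data[address] = value & self.full

    def update_mask(self, address, mask):
        # Models Update_Mask: replace the mask for one CAM line.
        self.mask[address] = mask & self.full

    def read_mask(self, address):
        # Models Read_Mask: return the mask for one CAM line.
        return self.mask[address]

    def search(self, key):
        # A line matches when key and data agree on every unmasked bit.
        for address, (data, mask) in enumerate(zip(self.data, self.mask)):
            if data is not None and (key & mask) == (data & mask):
                return address
        return None
```

With a full mask, a line holding 0xC0A80100 matches only that exact key; after the mask is updated to 0xFFFFFF00, any key sharing the upper 24 bits matches the same line.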

Finally, it is desirable to be able to replace entries. 
Particularly, it is useful to replace filler (or dummy) L3 
entries with new valid L3 entries. The Replace_L3 com- 
mand is provided for this purpose. The CPU 161 provides an 
L3 search key to the switch fabric 210 and provides the new 
valid L3 entry. Then, the CPU 161 issues the Replace_L3
command and may begin polling the command status. The 
switch fabric 210 reads the parameters provided by the CPU 
161 and performs a search of the forwarding database 140 
for the matching L3 entry. After locating the matching L3 
entry, the associated data corresponding to the matching 
entry is replaced with the new valid L3 entry provided by the 
CPU 161. After the L3 entry has been replaced, the switch 
fabric 210 notifies the CPU 161 of the status of the command 
by setting the command status done flag. 
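The replace operation can be sketched as a search followed by an overwrite of the associated data. The sketch assumes a forwarding database organized as a key store (`cam_keys`) whose match index addresses an associated-data store (`ram`); these names, the tuple-shaped keys, and the `False` return on a miss are illustrative assumptions only.

```python
def replace_l3(cam_keys, ram, search_key, new_entry):
    """Find the line matching `search_key` and replace its associated data."""
    try:
        index = cam_keys.index(search_key)   # search for the matching L3 entry
    except ValueError:
        return False                         # assumed behavior on a miss
    ram[index] = new_entry                   # filler data becomes the valid entry
    return True
```

The search key itself is unchanged in this model; only the associated data at the matching index is overwritten.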

Importantly, while embodiments of the present invention 
have been described with respect to specific commands and 
detailed steps for executing particular commands, those of 
ordinary skill in the art will appreciate that the present 
invention is not limited to any particular set of commands or 
sequence of execution. 

In the foregoing specification, the invention has been 
described with reference to specific embodiments thereof. It 
will, however, be evident that various modifications and 
changes may be made thereto without departing from the 
broader spirit and scope of the invention. For example, 
embodiments of the present invention have been described 
with reference to specific network protocols such as IP.
However, the method and apparatus described herein are 
equally applicable to other types of network protocols. The 
specification and drawings are, accordingly, to be regarded 
in an illustrative rather than a restrictive sense. 

What is claimed is:

1. A switch fabric comprising: 

a search engine for coupling to a forwarding database 
memory and a plurality of input ports, the search engine 
configured to schedule and perform accesses to the
forwarding database memory and to transfer forward- 
ing decisions to the plurality of input ports; and 

a header processing unit coupled to the search engine and 
having an arbitrated interface for coupling to the plurality
of input ports, the header processing unit configured
to receive a packet header from an input port of the
plurality of input ports and to construct a first search 
key for accessing the forwarding database memory 
based upon a predetermined portion of the packet 
header, the predetermined portion of the packet header 
being selected based upon a class of a plurality of 
classes with which the packet header is associated. 

2. The switch fabric of claim 1, wherein the header 
processing unit further comprises the following pipeline 
stages:

an address accumulation unit, coupled to the plurality of 
input ports and the arbitrated interface, for accessing 
address information from the packet header; 

an encapsulation processing unit, coupled to the plurality 
of input ports and the arbitrated interface, for selecting
a predetermined set of fields from the packet header to 
determine a type of encapsulation; 

a header class matching unit coupled to the plurality of 
input ports and the arbitrated interface, the header class 
matching including comparison logic to determine a
header class based upon the type of encapsulation and 
a predetermined set of fields. 

3. The switch fabric of claim 1, wherein the first search
key is a Layer 3 (L3) search key. 

4. The switch fabric of claim 3, wherein the header processing
unit is further configured to construct a Layer 2 (L2)
search key for accessing the forwarding database memory. 

5. The switch fabric of claim 1, wherein the first search
key is a Layer 2 (L2) search key. 

6. The switch fabric of claim 1, wherein the plurality of
input ports may each request a forwarding decision
independently of the others.

7. The switch fabric of claim 1, wherein the forwarding 
database memory comprises one or more content addressable
memories (CAMs) coupled to a random access memory
(RAM). 

8. The switch fabric of claim 1, further including a 
command execution unit configured to interface with a 
processor, the command execution unit further configured to 
access the forwarding database memory on behalf of the
processor. 

9. The network device of claim 1, wherein the forwarding 
database memory comprises a first memory and a second 
memory and wherein the search engine is configured to 
pipeline accesses to the first memory and the second
memory. 

10. The switch fabric of claim 3, wherein the L2 search 
key is of a first size and the L3 search key is of a second size, 
the second size being greater than the first size. 

11. A switch fabric comprising:

a search engine for coupling to a forwarding database
memory and a plurality of input ports, the search engine
configured to schedule and perform accesses to the
forwarding database memory and to transfer forwarding
decisions to the plurality of input ports; and

a header processing unit coupled to the search engine and
having an arbitrated interface for coupling to the plurality
of input ports, the header processing unit configured
to receive a packet header from an input port of the
plurality of input ports and to construct a first key and 
a second key for accessing a forwarding database 
memory, the first key comprising one or more fields 
from a first portion of the packet header and having a 
first length, the second key comprising one or more 
fields from a second portion of the packet header and 
having a second length. 

12. The switch fabric of claim 11, wherein the first key is 
a Layer 2 (L2) key for retrieving an L2 entry from the 
forwarding database memory, the second key is a Layer 3 
(L3) key for retrieving an L3 entry from the forwarding 
database memory. 

13. The switch fabric of claim 12, wherein the first length 
is smaller than the second length. 

14. The switch fabric of claim 12, wherein the first portion 
of the packet header comprises a media access control 
(MAC) header, and wherein the second portion of the packet 
header comprises an Internet Protocol (IP) header. 

15. The switch fabric of claim 11, wherein the forwarding 
database memory comprises one or more content address- 
able memories (CAMs) coupled to a random access memory 
(RAM). 

16. The switch fabric of claim 15, wherein an address for 
accessing the RAM includes an index produced by the one 
or more CAMs, and wherein the RAM contains both L2 and 
L3 forwarding information. 

17. The switch fabric of claim 11, wherein the packet 
header includes an L2 header and an L3 header, and wherein 
the header processing unit comprises pipelined logic to 
allow processing of more than one packet header 
simultaneously, the pipeline logic including: 

an encapsulation block configured to determine the type 
of header encapsulation that has been employed in a 
first packet header and to determine an indication of the 
start of the L3 header based upon the type of header 
encapsulation; and 

an L3 header class matching block coupled to the encap- 
sulation block for receiving the indication of the start of 
the L3 header, the L3 header class matching block 
configured to determine a class of a plurality of L3 
classes with which a second packet header is associated 
based upon one or more fields in the L2 header and the 
L3 header. 

18. A network device comprising: 

a plurality of ports including a first port for receiving a 
packet from a network; 

a forwarding memory including a first memory and a 
second memory, the first memory having stored therein 
an associative data entry corresponding to the
packet, the second memory coupled to the first memory 
and having stored therein an associated data entry 
corresponding to the associative data entry, the asso- 
ciative data entry including an indication of a set of 
ports to which the packet should be forwarded; and 

a search engine coupled to the plurality of ports and the 
forwarding memory, the search engine configured to 
schedule and perform accesses to the forwarding 
memory and to transfer the indication to the first port. 

19. The network device of claim 18, wherein the search
engine is configured to eliminate Layer 2 (L2) destination 
address (DA) matching whenever an L2 learning cycle is 
needed. 

20. The network device of claim 19, wherein the first 
memory comprises one or more content addressable memo- 
ries (CAMs), and wherein the second memory comprises a 
random access memory (RAM). 

21. The network device of claim 18, wherein the first 
memory and the second memory may be accessed in par- 
allel. 

22. The network device of claim 21, wherein the search
engine is configured to pipeline accesses to the first memory 
and the second memory. 

23. The network device of claim 22, wherein the first 
memory comprises one or more content addressable memo- 
ries (CAMs), and wherein the second memory comprises a 
random access memory (RAM). 